
K8s-9372: Add modification for custom policy #57

Merged
8 commits merged into master on Mar 28, 2024

Conversation

gizquierdo-s11

What this PR does / why we need it:

This PR is required by K8s-9372 to add a new DeletePolicy that lets us remove machines whose status is different from "Ready" before evaluating the remaining cases, such as newer or older machines.

Which issue(s) this PR fixes:
Fixes #K8s-9372

// CustomMachineSetDeletePolicy prioritizes both Machines that have the annotation
// "cluster.k8s.io/delete-machine=yes" and Machines that are unhealthy
// (Status.ErrorReason or Status.ErrorMessage are set to a non-empty value).
// It then prioritizes machines whose Status is different from Ready.
Member

Maybe say explicitly that it prioritizes new machines over old ones for deletion

@@ -67,6 +76,34 @@ func oldestDeletePriority(machine *v1alpha1.Machine) deletePriority {
return deletePriority(float64(mustDelete) * (1.0 - math.Exp(-d.Seconds()/secondsPerTenDays)))
}


func customDeletePolicy(machine *v1alpha1.Machine) deletePriority {
// Iterate through the conditions to check whether the machine is not "Ready"

Because we always return mustDelete, there's no real order to the machines with that value.
That's something we should eventually improve; for now maybe add a comment.

However, maybe reorder these checks already (see the sketch after this list):

  • deletion timestamp non zero
  • machine has no nodeRef -> still provisioning
  • deletion annotation
  • node status NotReady
  • error status
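
A minimal sketch of customDeletePolicy with the checks in that order (a hedged illustration, not the merged code: mustDelete, DeleteNodeAnnotation, oldestDeletePriority and the Status fields come from the existing delete_policy.go, while the exact condition type and the corev1 import are assumptions):

```go
func customDeletePolicy(machine *v1alpha1.Machine) deletePriority {
	// 1. Deletion timestamp non zero: the machine is already being deleted.
	if machine.ObjectMeta.DeletionTimestamp != nil && !machine.ObjectMeta.DeletionTimestamp.IsZero() {
		return mustDelete
	}
	// 2. No nodeRef yet: the machine is still provisioning.
	if machine.Status.NodeRef == nil {
		return mustDelete
	}
	// 3. The machine is explicitly marked for deletion via the annotation.
	if v, ok := machine.ObjectMeta.Annotations[DeleteNodeAnnotation]; ok && v != "" {
		return mustDelete
	}
	// 4. The node reports a Ready condition that is not True (assumes node-style conditions).
	for _, c := range machine.Status.Conditions {
		if c.Type == corev1.NodeReady && c.Status != corev1.ConditionTrue {
			return mustDelete
		}
	}
	// 5. The machine reports an error status.
	if machine.Status.ErrorReason != nil || machine.Status.ErrorMessage != nil {
		return mustDelete
	}
	// Otherwise fall back so that newer machines get a higher deletion priority.
	return mustDelete - oldestDeletePriority(machine)
}
```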

Author

Okay, I'll "split" the NodeReady/Conditions report to make machine selection more granular.

pkg/controller/machineset/delete_policy.go (outdated)
return mustDelete
}
// If annotations are set + DeleteAnnotation is present
if machine.ObjectMeta.Annotations != nil && machine.ObjectMeta.Annotations[DeleteNodeAnnotation] != "" {

Unnecessary nil check:

Suggested change
if machine.ObjectMeta.Annotations != nil && machine.ObjectMeta.Annotations[DeleteNodeAnnotation] != "" {
if v, ok := machine.ObjectMeta.Annotations[DeleteNodeAnnotation]; ok && v != "" {
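
For context, indexing a nil map in Go returns the zero value, so the explicit nil check adds nothing; a minimal standalone illustration (the annotation key is the one mentioned in the doc comment, used here only as an example):

```go
package main

import "fmt"

func main() {
	var annotations map[string]string // nil, as when ObjectMeta.Annotations was never set

	// Reading from a nil map is safe: it yields the zero value and ok == false.
	v, ok := annotations["cluster.k8s.io/delete-machine"]
	fmt.Printf("value=%q ok=%v\n", v, ok) // value="" ok=false
}
```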

pkg/controller/machineset/delete_policy.go (outdated)
Comment on lines 102 to 104
// Thinking about this, maybe it's a good idea to delegate to oldestDeletePriority, like
// newestDeletePriority does when nothing matches, or just return couldDelete
return mustDelete - oldestDeletePriority(machine)

Makes sense like this.
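
For illustration, the fallback inverts the age-based priority from oldestDeletePriority, so newer machines end up with the higher deletion priority. A hedged sketch of that effect (it reuses the deletePriority type and the mustDelete / secondsPerTenDays values from delete_policy.go; the helper name and the ages are made up):

```go
// newestFirstExample is illustration only, not part of the PR.
// Assumes "fmt", "math" and "time" are imported alongside delete_policy.go.
func newestFirstExample() {
	agePriority := func(age time.Duration) deletePriority {
		// Same shape as oldestDeletePriority: grows toward mustDelete as the machine ages.
		return deletePriority(float64(mustDelete) * (1.0 - math.Exp(-age.Seconds()/secondsPerTenDays)))
	}

	oldMachine := agePriority(30 * 24 * time.Hour) // high age-based priority
	newMachine := agePriority(1 * time.Hour)       // low age-based priority

	// After the inversion the newer machine scores higher, so it is deleted first.
	fmt.Println(mustDelete-newMachine > mustDelete-oldMachine) // true
}
```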

pkg/controller/machineset/delete_policy.go (outdated)
gizquierdo-s11 commented Mar 26, 2024

@squ94wk I think that with these changes we'll have the expected delete policy. The only pending change (which requires some extra changes on MC) is to get the updated status from the Machine (not the Node).

Currently, the machine can be powered off or have the kubelet stopped, and its status still shows as "Ready":

Machine: test-mcontroller-65f466779d-x2mgb - Condition: {Ready True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC } - Status: True - Type: Ready
Machine: test-mcontroller-65f466779d-qj6hz - Condition: {Ready True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC } - Status: True - Type: Ready

Info: Machine qj6hz was off, yet Status/Type still reported Ready: True when a downscale from 2 machines to 1 was triggered from the UI.

The checks were put in the expected order, and the PR comments were applied to improve the code. I'll verify with some extra tests, but for now it looks fine.
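
For reference, a minimal sketch of the kind of condition dump that produced the two "Machine: ..." lines above (the helper name is hypothetical, and it assumes Status.Conditions on the v1alpha1 Machine carries node-style conditions with Type and Status fields):

```go
// dumpMachineConditions prints every condition of every Machine.
// Illustration only; requires "fmt" and the cluster v1alpha1 package.
func dumpMachineConditions(machines []v1alpha1.Machine) {
	for _, m := range machines {
		for _, c := range m.Status.Conditions {
			fmt.Printf("Machine: %s - Condition: %v - Status: %v - Type: %v\n",
				m.Name, c, c.Status, c.Type)
		}
	}
}
```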

pkg/apis/cluster/v1alpha1/machineset_types.go (outdated)
pkg/controller/machineset/delete_policy.go (outdated)
squ94wk merged commit 1316de5 into master on Mar 28, 2024
2 checks passed